Toward sensitive document release with privacy guarantees

Authors

  • David Sánchez
  • Montserrat Batet
Abstract

Privacy has become a serious concern for modern Information Societies. The sensitive nature of much of the data that are exchanged daily or released to untrusted parties requires that responsible organizations undertake appropriate privacy protection measures. Nowadays, many of these data are texts (e.g., emails, messages posted on social media, healthcare outcomes) that, because of their unstructured and semantic nature, constitute a challenge for automatic data protection methods. In fact, textual documents are usually protected manually, in a process known as document redaction or sanitization. To do so, human experts identify sensitive terms (i.e., terms that may reveal identities and/or confidential information) and protect them accordingly (e.g., via removal or, preferably, generalization). To relieve experts of this burdensome task, in a previous work we introduced the theoretical basis of C-sanitization, an inherently semantic privacy model that provides the basis for the development of automatic document redaction/sanitization algorithms and offers clear, a priori privacy guarantees on data protection. Despite its potential benefits, C-sanitization still presents some limitations when applied in practice (mainly regarding flexibility, efficiency and accuracy). In this paper, we propose a new, more flexible model, named (C, g(C))-sanitization, which enables an intuitive configuration of the trade-off between the desired level of protection (i.e., controlled information disclosure) and the preservation of the utility of the protected data (i.e., amount of semantics to be preserved). Moreover, we present a set of technical solutions and algorithms that provide an efficient and scalable implementation of the model and improve its practical accuracy, as we illustrate through empirical experiments.

Similar resources

Differentially Private Local Electricity Markets

Privacy-preserving electricity markets have a key role in steering customers towards participation in local electricity markets by guaranteeing the protection of their sensitive information. Moreover, these markets make it possible to statically release and share the market outputs for social good. This paper aims to design a market for local energy communities by implementing Differential Privacy (DP)...

Data masking for privacy-sensitive learning

We study the problem of data release with privacy, where data is made available with privacy guarantees while keeping the usability of the data as high as possible. This is important in healthcare and other domains with sensitive data. In particular, we propose a method of masking sensitive parts of private data while ensuring that a learner trained using the masked data is similar to the learn...

Comparison of compliance with confidentiality principles in legal cases, based on World Health Organization guidelines, in teaching hospitals affiliated with the Iran, Tehran and Shahid Beheshti Universities of Medical Sciences: 1387

Introduction: In many countries, medical records are important legal documents, essential not only for the present and future care of patients but also as legal documents to protect the patients and the hospitals. The medical record is a confidential document, and the patient's right to privacy must always be respected. Methods: This is a descriptive cross-sectional study. Study sample were 34...

More Flexible Differential Privacy: The Application of Piecewise Mixture Distributions in Query Release

There is an increasing demand to make data “open” to third parties, as data sharing has great benefits in data-driven decision making. However, with a wide variety of sensitive data collected, protecting the privacy of individuals, communities and organizations is an essential factor in making data “open”. The approaches currently adopted by industry in releasing private data are often ad hoc and p...

C-sanitized: a privacy model for document redaction and sanitization

Within the current context of Information Societies, large amounts of information are exchanged and/or released daily. The sensitive nature of much of this information causes a serious privacy threat when documents are uncontrollably made available to untrusted third parties. In such cases, appropriate data protection measures should be undertaken by the responsible organization, especially und...

Journal:
  • Eng. Appl. of AI

Volume 59

Publication date: 2017